Abstract. Examples of small contingency tables on binary random variables with large integer
programming gaps on the lower bounds of cell entries were constructed by Sullivant.
We argue here that the margins for which these constructed large gaps occur are rarely
encountered, thus reopening the question of whether linear programming is an effective heuristic
for detecting disclosures when releasing margins of multi-way tables. The notion of rarely
encountered is made precise through the language of standard pairs.

1. Introduction

When governmental bodies like census bureaus collect data there is a tension between
publicly releasing as much information as possible and ensuring that no individual's
private data can be discerned from the released information. One such case arises when
each piece of private data is recorded in a cell of a multi-dimensional contingency table
and the released data is a collection of smaller margin tables (or simply margins) which
portray the interactions between some subsets of the variables. These margins are simply
higher dimensional analogues of row sums and column sums on the contingency table. So how
would we know that a person's individual data is protected despite the release of possibly
many margins?

There are many criteria that may have to be accounted for in limiting the disclosure of
private data: see for example [7] on issues with small cell counts in sparse contingency tables,
or see [9] for privacy concerns in genetic databases (see [2, Page 5] for an introductory
discussion). One such measure is the practical difficulty in attaining bounds for cell entries that
can be discerned from the released margins. One way in which these bounds can be found is
from standard methods in integer programming, but solving the general integer program is
NP-complete [14, §18.1] and, practically speaking, such programs are very challenging
to solve. On the other hand, linear programs can be solved in polynomial time and there
is significant high-powered software designed specifically to solve linear programs in practice.
So one must know whether the linear relaxations of the integer programs associated to disclosure
limitation are always good approximations for bounding cells in contingency tables.

Based on theoretical results for 2-way tables and practical experience on higher dimensional
contingency tables, it was thought ([3], [5]) that the linear approximation was always reliable for
bounding cells. However, using Gröbner basis techniques, Sullivant [17, Theorem 1] constructed
a family of contingency tables on $n \ge 4$ binary random variables with a specified collection of
margins, denoted by $\Delta_n$, such that the gap between the linear programming approximation
of the cell bounds and the true integer programming cell bound for one of these margins is
$2^{n-3} - 1$. Thus, we are not entitled to always assume that the linear approximation of the
bounds on cell entries is a faithful approximation to the true bounds on these cell entries.

All that said, perhaps the reason for [3] and [5] thinking that the linear approximation was
reliable for bounding cell entries was that, in their practical experience, margins with large
gaps were never encountered. Instead, perhaps the right claim for the practitioners to make
is that the linear relaxations of the integer programs associated to disclosure limitation are
almost always good approximations for bounding cells in contingency tables. In other words,
while it is the case that Sullivant's specified margins do indeed provide examples of the linear
approximation being poor, these margins may be rarely encountered in practice. We argue
here that this is the case for Sullivant's large gaps, thus reopening the question of whether
linear programming is an effective heuristic for detecting disclosures when releasing margins of
multi-way tables:

Theorem 1.1. The margins for Sullivant's $(2^{n-3} - 1)$-gaps on the models $\Delta_n$ are rare.

We will make the notion of rare precise through standard pairs. In the sections that follow
we will first review and fortify Sullivant's construction. Next, after defining standard pairs and
arguing that they are the right tool to use for measuring rarity, we will prove Theorem 1.1
using a series of propositions regarding standard pairs specific to the strengthened Sullivant
construction. We will see that the strengthened construction is not just made for its own sake
but is crucial to proving Theorem 1.1. Finally, the detection of the $(2^{n-3} - 1)$-gaps is made
possible through Gröbner basis theory and this has been used to provide other examples of
margins with large gaps [10, Corollary 4.3]. We will close with computational results showing
that these other instances of large gaps are also rare in the same sense of Theorem 1.1.

2. An Alternative Construction Of Sullivant's Large Gaps 

Let $A$ be a fixed matrix with $N$ columns and $c$ a real cost vector with $N$ entries. Then
for every fixed $b$ that is a non-negative integral combination of the columns of $A$ (i.e.
$b \in \mathbb{N}A$) we have the integer program
$\mathrm{IP}_{A,c}(b) := \min\{c \cdot u : Au = b,\ u \in \mathbb{N}^N\}$. The linear
relaxation $\mathrm{LP}_{A,c}(b)$ of $\mathrm{IP}_{A,c}(b)$ is simply the same problem with the
constraint $u \in \mathbb{N}^N$ replaced by $u \ge 0$ and $u$ real. For each fixed
$b \in \mathbb{N}A$ we have the quantity
\[ \mathrm{gap}_{A,c}(b) := \text{optimal value of } \mathrm{IP}_{A,c}(b) - \text{optimal value of } \mathrm{LP}_{A,c}(b) \]
and the maximum of all these is the integer programming gap [10]
\[ \mathrm{gap}_c(A) = \max_{b \in \mathbb{N}A} \{ \mathrm{gap}_{A,c}(b) \}. \]

We will be especially interested in scenarios where $c := (1, \mathbf{0})$ ($= e_1 \in \mathbb{R}^N$) but even in this
case [10, §4] computing the gap precisely can be challenging. However, using Gröbner basis
theory, we have the following proposition which is a quick way to get a lower bound on the
integer programming gap. We repeat its proof here as we need its content for later results.
From here on we replace $\mathrm{gap}_{e_1}(A)$ by simply $\mathrm{gap}_-(A)$.

Proposition 2.1. [10, Corollary 4.3] Let $A$ be an integer matrix, let $e_1$ be a cost vector and
let $\succ$ be any term order. Suppose $g := u - v$ is a reduced Gröbner basis element of $A$ (with
respect to the weight order induced by $e_1$ and $\succ$) with $u_1 = \alpha \ge 2$. Then $\mathrm{gap}_-(A) \ge \alpha - 1$.
Proof. (This proof is due to Seth Sullivant.) By our choice of $g$, $u$ is a non-optimal solution for
the integer program with constraint vector $Au$. Now consider the vector $u - e_1$ where $e_1$ is
the first standard unit vector. By the reducedness of $g$, $u - e_1$ must be an optimal solution for
the integer program $\min\{w_1 : Aw = A(u - e_1),\ w \text{ integral}\}$, with optimal value $\alpha - 1$.
On the other hand, the vector $w^* := (u - e_1) - \frac{\alpha - 1}{\alpha}(u - v)$ is a solution to the linear
relaxation of the integer program with cost equal to zero, and so it must be an optimal solution
to the linear relaxation of the above program. Thus, this gives an instance showing the gap is
greater than or equal to $\alpha - 1$. □

We can rephrase Proposition 2.1 as saying that $\mathrm{gap}_{A,e_1}(b) = \alpha - 1$ where $b = A(u - e_1)$.
Note too, by the same argument, that for every integer $1 \le \beta \le \alpha - 1$ the vector
$u - (\alpha - \beta)e_1$ is an optimal integer solution for the integer program whose linear
relaxation has an optimal solution of $(u - (\alpha - \beta)e_1) - \frac{\beta}{\alpha}(u - v)$.

Corollary 2.2. With the hypothesis of Proposition 2.1, for every integer $1 \le \beta \le \alpha - 1$, there
exists a $b$ such that $\mathrm{gap}_{A,e_1}(b) = \beta$.
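
Proposition 2.1 and Corollary 2.2 can be checked by hand on a toy instance. The sketch below is an illustration only and is not taken from [10] or [17]: it assumes the one-row matrix $A = (1\ \ \alpha)$ with $\alpha = 4$, for which $g = \alpha e_1 - e_2$ is the only reduced Gröbner basis element with respect to the weight $e_1$. The helper names (`solve_square`, `lp_min`, `ip_min`, `gap`) are hypothetical; the linear program is solved exactly by enumerating basic feasible solutions, which is only sensible for very small matrices of full row rank.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination; None if singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def lp_min(A, b, c):
    """min c.u s.t. Au = b, u >= 0 real, via basic feasible solutions.

    Assumes A has full row rank and the cost is bounded below on u >= 0.
    """
    m, N = len(A), len(A[0])
    best = None
    for cols in combinations(range(N), m):
        sub = [[A[i][j] for j in cols] for i in range(m)]
        x = solve_square(sub, b)
        if x is None or any(v < 0 for v in x):
            continue
        value = sum(Fraction(c[j]) * v for j, v in zip(cols, x))
        if best is None or value < best:
            best = value
    return best


def ip_min(A, b, c, bound):
    """min c.u s.t. Au = b, u in {0..bound}^N, by brute force."""
    best = None
    for u in product(range(bound + 1), repeat=len(c)):
        if all(sum(a * x for a, x in zip(row, u)) == rhs for row, rhs in zip(A, b)):
            value = sum(ci * x for ci, x in zip(c, u))
            if best is None or value < best:
                best = value
    return best


def gap(A, b, c, bound):
    """Integer programming gap for the single right-hand side b."""
    return ip_min(A, b, c, bound) - lp_min(A, b, c)


alpha = 4                 # toy value, an assumption of this sketch
A = [[1, alpha]]          # toy matrix, not one of the margin matrices in the text
c = [1, 0]                # cost vector e_1
gaps = [gap(A, [beta], c, bound=alpha) for beta in range(1, alpha)]
print([str(g) for g in gaps])
```

Under these assumptions the right-hand sides $b = \beta$ for $\beta = 1, 2, 3$ realise the gaps $1, 2, 3$, in line with Corollary 2.2, while $b = \alpha$ has gap $0$.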

We will be interested in scenarios like the figure below. The problem of bounding cell entries
is precisely that of placing lower and upper bounds on, without loss of generality, the entry $u_{000}$
given that all the $u_i$'s are non-negative and integral and that they sum in a manner described
by the margins below. The linear relaxation of this problem approximates the bounds
by permitting the $u_i$'s to be real valued and bounding the entry $u_{000}$ accordingly. We will
focus on the discrepancy between the true lower bound and its linear approximation. In what
follows we will review how the problem of finding the minimum value of the $u_{\mathbf{0}}$ cell entry given
the $\mathcal{S}$-margins $b$ is equivalent to solving an integer program $\mathrm{IP}_{A(\mathcal{S}),(1,\mathbf{0})}(b)$, with the linear
approximation being the linear relaxation of this program. The largest possible discrepancy
for $\mathcal{S}$ is precisely the integer programming gap $\mathrm{gap}_-(\mathcal{S})$.




These margins can be described formally as follows: A hierarchical model is a simplicial
complex $\mathcal{S}$ on a ground set $[n] = \{1, 2, \ldots, n\}$ together with an integer vector $d = (d_1, d_2, \ldots, d_n)$.
The quantity $n$ will be the dimension of our multiway contingency table and $d_i$ is the
number of levels in the $i$-th direction of the table. Every facet $F$ of $\mathcal{S}$ indicates a margin to be
released and we can always construct a matrix $A(\mathcal{S})$ to describe these margins. From here on
we will assume that our contingency tables are binary ($d_1 = d_2 = \cdots = d_n = 2$). See [11] for
further discussion.

Using Proposition 2.1, Sullivant [17, Theorem 8] showed that for every $n \ge 4$ there is a
model $\Delta_n$ with margin $b$ such that $\mathrm{gap}_-(\Delta_n) \ge 2^{n-3} - 1$. Sullivant constructs a Graver basis
element $f_n$ with 0-th entry equal to $2^{n-3}$ for $A(\Delta_n)$ and, because of the algebraic interpretation
of $\Delta_n$, could then claim that this Graver basis element is part of a reduced Gröbner basis.
In the remainder of this section we will show something a little stronger while simultaneously
giving a slightly easier proof of [17, Theorem 8]: the constructed element $f_n$ is in fact a circuit
(a minimal linear dependence) for $A(\Delta_n)$. While the proof is similar to that of [17, Theorem
8] it has the added benefit of avoiding the difficult primitive condition needed for Graver basis
elements and, at the same time, showing that Sullivant's construction is really a modified
version of the well known checkerboard vector. More importantly, we will need the stronger
circuit property in the next section where we prove Theorem 1.1. Let's now build up Sullivant's
construction.

Example 2.3. Suppose we have a $2 \times 2 \times 2$ contingency table with released margins specified
by the simplicial complex $B = \{\{1,2\}, \{1,3\}, \{2,3\}\}$ (see Figure). If we replace the cell entries
in the 3-dimensional binary table as follows:
$u = (u_{000}, u_{001}, u_{010}, u_{011}, u_{100}, u_{101}, u_{110}, u_{111}) = (5, 3, 8, 8, 6, 5, 10, 7)$
we then get the values in the released margins as specified in the figure.

Another way of saying this is as follows: The computation of these margins is equivalent to
the computation of $A(B)u = (b_{00+}, b_{01+}, b_{10+}, b_{11+}, b_{0+0}, \ldots, b_{+11})$, where the columns of the
0/1-matrix $A(B)$ have indices $i$ which are ordered lexicographically (0,0,0), (0,0,1), (0,1,0),
(0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1). In turn, the columns of $A(B)$ are labelled $e_i$ in
order. For each face $F$ of $B$ we have a row matrix with $2^{|F|}$ rows and $2^n = 2^3$ columns: one
row for each $k \in \{0,1\}^F$ with the $i$-th entry in that row being equal to 1 (and 0 otherwise) if
and only if $i$ restricted to $F$, written $i_F$, equals $k$. For example, the row matrix for $\{1,2\}$ of $B$ is
\[
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\begin{array}{l}
i_{\{1,2\}} = (0,0) \\
i_{\{1,2\}} = (0,1) \\
i_{\{1,2\}} = (1,0) \\
i_{\{1,2\}} = (1,1)
\end{array}
\]

This proof of A(B) being the matrix that describes precisely the margins of S can be seen in 
Eqns. (3) & (4)] and holds for any (binary or otherwise) hierarchical model. Also, the 
computational package 4ti2 [1] can be used to compute the matrix of margins for any (binary 
or otherwise) model. 
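
The description of $A(B)$ in Example 2.3 translates directly into a few lines of code. The following is a minimal sketch for binary models only, with a hypothetical helper name `margin_matrix`; it is not the 4ti2 routine mentioned above. It builds the 0/1 matrix facet by facet and applies it to the table $u$ of Example 2.3.

```python
from itertools import product


def margin_matrix(facets, n):
    """0/1 margin matrix of a binary hierarchical model on n variables.

    Columns are indexed by the cells i in {0,1}^n in lexicographic order.
    For every facet F there is one row per k in {0,1}^F, with a 1 exactly
    in those columns i whose restriction to F equals k.
    """
    cells = list(product((0, 1), repeat=n))
    rows = []
    for facet in facets:
        coords = [v - 1 for v in sorted(facet)]  # variables are 1-based in the text
        for k in product((0, 1), repeat=len(coords)):
            rows.append([int(tuple(i[c] for c in coords) == k) for i in cells])
    return rows


facets_B = [{1, 2}, {1, 3}, {2, 3}]      # the complex B of Example 2.3
A_B = margin_matrix(facets_B, 3)
u = [5, 3, 8, 8, 6, 5, 10, 7]            # cell entries u_000, ..., u_111
margins = [sum(a * x for a, x in zip(row, u)) for row in A_B]
print(margins)
```

The first four entries of `margins` are the $\{1,2\}$-margin $(b_{00+}, b_{01+}, b_{10+}, b_{11+}) = (8, 16, 11, 17)$, followed by the $\{1,3\}$- and $\{2,3\}$-margins; each block of four sums to the table total 52.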

The binary hierarchical model in the previous example is simply the boundary of the
3-simplex. The first step in Sullivant's construction is based on the model $B_n = \{S \subsetneq [n-2]\}$.
The following lemma is a well known folklore result but we prove it here for the sake of
completeness:

Lemma 2.4. The kernel of the matrix $A(B_n)$ is 1-dimensional. Letting $e_i$ be the standard
unit vector in index position $i$, the unique basis element (up to scalar multiplication) for this
kernel is the checkerboard vector
\[ \mathrm{checker} = \sum_{i \,:\, \mathbf{1} \cdot i \text{ even}} e_i \; - \sum_{i \,:\, \mathbf{1} \cdot i \text{ odd}} e_i. \]

Proof. From [11, Theorem 2.6] the dimension of the kernel of $A(B_n)$ is exactly the number of
elements in $2^{[n-2]}$ that are not in $B_n$. There is only one such element, $[n-2]$ itself, hence
the kernel of $A(B_n)$ has dimension 1. Since every subset of the columns of size $2^{n-2} - 1$ is a
linearly independent set, every column of $A(B_n)$ must be used non-trivially in the unique
dependence, say $w$, of $A(B_n)$, i.e., $w_i \ne 0$ for every index $i \in \{0,1\}^{n-2}$. Let the last component
of $w$ be $w_{\mathbf{1}} = 1$.

From Lemma 2.1] this dependence must equal zero on every facet of the model $B_n$ and
all the facets of $B_n$ are of the form $[n-2] \setminus \{j\}$ where $j \in [n-2]$. For each facet $S := [n-2] \setminus \{j\}$,
there is precisely one other index that creates a column with a non-zero entry in the $k = \mathbf{1}$
row of facet $S$ and this index is $\mathbf{1} - e_j$. Furthermore, since each of these entries equals 1,
$w_{\mathbf{1} - e_j}$ must equal $-1$ for every $j \in [n-2]$. In other words, the index vectors $i$ with
$\mathbf{1} \cdot i = n - 3$ must take the value $-1$ in $w$. Repeating this argument on each of the $i$ with
$\mathbf{1} \cdot i = n - 3$, we get that $w$ must also take the value $+1$ on each of the positions indexed
by $i$ with $\mathbf{1} \cdot i = n - 4$. Repeating recursively we get the vector $w = \mathrm{checker}$ (up to possible
sign change) as claimed. □
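
Lemma 2.4 is easy to confirm numerically for small cases. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, `m` stands for $n - 2$ (so `m = 2` and `m = 3` correspond to $n = 4$ and $n = 5$), and the rank is computed exactly over the rationals. It checks that the checkerboard vector lies in the kernel of the margin matrix of the boundary of the simplex on $[m]$ and that this kernel is 1-dimensional.

```python
from fractions import Fraction
from itertools import combinations, product


def margin_matrix(facets, n):
    """0/1 margin matrix of a binary hierarchical model (cells in lex order)."""
    cells = list(product((0, 1), repeat=n))
    rows = []
    for facet in facets:
        coords = [v - 1 for v in sorted(facet)]
        for k in product((0, 1), repeat=len(coords)):
            rows.append([int(tuple(i[c] for c in coords) == k) for i in cells])
    return rows


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            if M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def check_checkerboard(m):
    """Return (checker in kernel?, kernel dimension) for the boundary of the simplex on [m]."""
    facets = [set(F) for F in combinations(range(1, m + 1), m - 1)]
    A = margin_matrix(facets, m)
    # +1 on cells with an even number of ones, -1 on cells with an odd number.
    checker = [1 if sum(i) % 2 == 0 else -1 for i in product((0, 1), repeat=m)]
    in_kernel = all(sum(a * x for a, x in zip(row, checker)) == 0 for row in A)
    return in_kernel, len(A[0]) - rank(A)


for m in (2, 3):
    print(m, check_checkerboard(m))
```

For both values of `m` the check returns `(True, 1)`, as the lemma predicts.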

The checkerboard vector is so called because of the alternating $+1$/$-1$'s depending on parity.
Next consider for each $n \ge 4$ the model $\Gamma_n := B_n \cup \{\{n-1\}\}$. It is not too difficult to see that
its matrix of margins is
\[
A(\Gamma_n) =
\begin{pmatrix}
A(B_n) & A(B_n) \\
\mathbf{1} & \mathbf{0} \\
\mathbf{0} & \mathbf{1}
\end{pmatrix}
\]
where the columns are indexed by all the $(i|0)$'s first followed by the $(i|1)$'s. The bottom two
row vectors come from the row matrix for the face $\{n-1\}$ of $\Gamma_n$. Permitting an abuse of
notation, we write $\Gamma^n$ in place of $A(\Gamma_n)$.

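Next, let $\sigma$ be the following index subset of the columns of $\Gamma^n$:
\[ \sigma = \{(\mathbf{0}|0), (\mathbf{0}|1)\} \cup \{(i|0) : \mathbf{1} \cdot i \text{ odd}\} \cup \{(i|1) : i \ne \mathbf{0},\ \mathbf{1} \cdot i \text{ even}\}. \]
Then the submatrix of $\Gamma^n$ indexed in order by these columns is
\[
\begin{pmatrix}
e_{\mathbf{0}} & e_{\mathbf{0}} & A(B_n)_{-\mathbf{0}} \\
1 & 0 & r \\
0 & 1 & \mathbf{1} - r
\end{pmatrix}
\]
where $r$ equals 1 in its first $2^{n-3}$ entries and 0 in the remaining $2^{n-3} - 1$ entries, $e_{\mathbf{0}}$ is the first
column of $A(B_n)$ and $A(B_n)_{-\mathbf{0}}$ is simply $A(B_n)$ with its first column, indexed by $\mathbf{0}$, removed.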

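Lemma 2.5. For each $n \ge 4$ there is a unique relation on the columns of the submatrix of
$\Gamma^n$ indexed by $\sigma$, given by
\[
f_n = 2^{n-3}\, e_{(\mathbf{0}|0)} + \sum_{i \,:\, i \ne \mathbf{0},\ \mathbf{1} \cdot i \text{ even}} e_{(i|1)}
\; - \; (2^{n-3} - 1)\, e_{(\mathbf{0}|1)} \; - \sum_{i \,:\, \mathbf{1} \cdot i \text{ odd}} e_{(i|0)}.
\]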



Proof. We first observe that $\mathrm{rank}(\Gamma^n) = \mathrm{rank}(A(B_n)) + 1$. In addition, the submatrix of
$\Gamma^n$ indexed by $\sigma$ has only one column more than $A(B_n)$ and so its kernel has dimension
equal to 1. The unique relation $f_n$ (up to scalar multiplication) must also respect those relations
on $A(B_n)$ and so must respect the checkerboard relation of Lemma 2.4. But then there are also
the last two rows of $\Gamma^n$ to account for, forcing the coefficient of $e_{(\mathbf{0}|0)}$ to be $2^{n-3}$ and
that of $e_{(\mathbf{0}|1)}$ to be $-(2^{n-3} - 1)$ as claimed. □
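
The relation of Lemma 2.5 can be verified mechanically for small $n$. The sketch below is an illustration only, with hypothetical helper names. It assumes the model $B_n \cup \{\{n-1\}\}$ on $n - 1$ binary variables, builds its margin matrix with cells $(i|l)$ in lexicographic order (a column permutation of the block form above, which does not affect kernels), and checks both that the stated relation lies in the kernel and that the columns in its support have a 1-dimensional kernel, so that the relation is a circuit.

```python
from fractions import Fraction
from itertools import combinations, product


def margin_matrix(facets, n):
    """0/1 margin matrix of a binary hierarchical model (cells in lex order)."""
    cells = list(product((0, 1), repeat=n))
    rows = []
    for facet in facets:
        coords = [v - 1 for v in sorted(facet)]
        for k in product((0, 1), repeat=len(coords)):
            rows.append([int(tuple(i[c] for c in coords) == k) for i in cells])
    return rows


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            if M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def sullivant_relation(n):
    """Coefficients of the relation, keyed by cells (i|l) with i in {0,1}^(n-2)."""
    zero = (0,) * (n - 2)
    f = {zero + (0,): 2 ** (n - 3), zero + (1,): -(2 ** (n - 3) - 1)}
    for i in product((0, 1), repeat=n - 2):
        if sum(i) % 2 == 1:
            f[i + (0,)] = -1          # odd parity: cell (i|0) with coefficient -1
        elif i != zero:
            f[i + (1,)] = 1           # even parity, i != 0: cell (i|1) with +1
    return f


def check_relation(n):
    """Return (relation in kernel?, kernel dimension of the support columns)."""
    facets = [set(F) for F in combinations(range(1, n - 1), n - 3)] + [{n - 1}]
    G = margin_matrix(facets, n - 1)
    f = sullivant_relation(n)
    vec = [f.get(cell, 0) for cell in product((0, 1), repeat=n - 1)]
    in_kernel = all(sum(a * x for a, x in zip(row, vec)) == 0 for row in G)
    support = [j for j, x in enumerate(vec) if x != 0]
    sub = [[row[j] for j in support] for row in G]
    return in_kernel, len(support) - rank(sub)


for n in (4, 5):
    print(n, check_relation(n))
```

For $n = 4$ the relation is $2e_{(00|0)} + e_{(11|1)} - e_{(00|1)} - e_{(01|0)} - e_{(10|0)}$, and for both $n = 4$ and $n = 5$ the check returns `(True, 1)`.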

Corollary 2.6. The vector $f_n$ is a circuit (i.e. a minimal linear dependence) for the matrix $\Gamma^n$.



To complete Sullivant's construction, the logit model of $\Gamma_n$ is given by $\Delta_n := \mathrm{logit}(\Gamma_n) =
\{T \cup \{n\} : T \in \Gamma_n\} \cup 2^{[n-1]}$. The matrix $A^n$ for the model $\Delta_n$ is the Lawrence lifting of
$\Gamma^n$ [13] and equals
\[
A^n =
\begin{pmatrix}
\Gamma^n & 0 \\
0 & \Gamma^n \\
I & I
\end{pmatrix}
\]
where $I$ is the $2^{n-1}$ identity matrix. Since $A^n$ is the Lawrence lifting of $\Gamma^n$ then [15, Chapter 7]
we have the following property: $g$ is an (integer) vector in the kernel of $\Gamma^n$ with
$g \in \mathbb{R}^{2^{n-1}}$ if and only if the lifted $\hat{g} := (g, -g) \in \mathbb{R}^{2^n}$ is in the
(integer) kernel of $A^n$. Consequently, if we have a vector $g \in \mathbb{R}^{2^{n-1}}$ in the kernel of $\Gamma^n$ and if
the support of $g$ equals $\tau \subseteq [2^{n-1}]$ then we denote the support of $\hat{g}$ by $\hat{\tau} \subseteq [2^n]$ and we can
also describe $\hat{\tau}$ as follows: if $\eta \in \{0,1\}^{n-1}$ then $\eta \in \tau \iff$ both $(\eta|0), (\eta|1) \in \hat{\tau}$. Because of
the isomorphism $g \leftrightarrow (g, -g)$ between the kernels of matrices for any model and its logit,
we have the first part of the following corollary. The second part follows from the first.

Corollary 2.7. (1) The lifted $\hat{f}_n$ is a circuit for $A^n$. (2) (from [15, Prop. 4.11 & Thm. 7.1])
The lifted $\hat{f}_n$ is a reduced Gröbner basis element for $A^n$.
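
The lifting step can also be checked by direct computation in the smallest case $n = 4$. The sketch below is an illustration under stated assumptions: the $6 \times 8$ matrix `GAMMA4` is the margin matrix of the facets $\{1\}, \{2\}, \{3\}$ on three binary variables with cells in lexicographic order, `f4` is the relation of Lemma 2.5 in that ordering, and the block form of the lifted matrix is the one displayed above. All names are hypothetical.

```python
from fractions import Fraction

# Margin matrix for n = 4 (cells 000, 001, ..., 111 in lexicographic order).
GAMMA4 = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # first variable = 0
    [0, 0, 0, 0, 1, 1, 1, 1],  # first variable = 1
    [1, 1, 0, 0, 1, 1, 0, 0],  # second variable = 0
    [0, 0, 1, 1, 0, 0, 1, 1],  # second variable = 1
    [1, 0, 1, 0, 1, 0, 1, 0],  # third variable = 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # third variable = 1
]
# Relation of Lemma 2.5 for n = 4: 2e_000 - e_001 - e_010 - e_100 + e_111.
f4 = [2, -1, -1, 0, -1, 0, 0, 1]


def lift_matrix(G):
    """Stack [[G, 0], [0, G], [I, I]], the block form assumed for the lifted matrix."""
    m, N = len(G), len(G[0])
    top = [row + [0] * N for row in G]
    middle = [[0] * N + row for row in G]
    identity = [[int(i == j) for j in range(N)] * 2 for i in range(N)]
    return top + middle + identity


def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            if M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


A4 = lift_matrix(GAMMA4)
f4_lifted = f4 + [-x for x in f4]                      # (g, -g)
lifted_in_kernel = all(sum(a * x for a, x in zip(row, f4_lifted)) == 0 for row in A4)
column_sums = {sum(row[j] for row in A4) for j in range(len(A4[0]))}
print(lifted_in_kernel, column_sums, rank(A4))
```

In this case the lifted vector lies in the kernel, every column sums to $n = 4$ (the gradedness used in Section 3), and the rank is $12 = 2^{n-1} + 2^{n-2}$, which matches the cone dimension quoted there.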

The lifted $\hat{f}_n$'s are precisely what Sullivant constructed. He argued that these were reduced
Gröbner basis elements for $A^n$ by showing that they were Graver basis elements, which is a
weaker condition than (1) above but still sufficient to be able to apply condition (2).
Consequently, applying Proposition 2.1 to $\hat{f}_n$ we get that $\mathrm{gap}_-(\Delta_n) \ge 2^{n-3} - 1$ and that this gap is
given by the margin $b := A^n(u_n - e_{\mathbf{0}})$, where $u_n$ denotes the positive part of $\hat{f}_n$.
In addition, from this margin $b$, other margins that provide the same size gap are easy to
construct. In the next section, via standard pairs, we show that for each $n \ge 4$ all these
margins are very rare.

3. Rare Encounters With Large Gaps Via Standard Pairs 

In order to examine the frequency with which the $(2^{n-3} - 1)$-gap occurs we need the notion of
standard pairs [16]. Let $A$ be a fixed matrix with $N$ columns and $c$ a cost vector with $N$ entries.
If $\gamma \in \mathbb{N}^N$ and $\tau \subseteq [N]$, we denote by $(\gamma, \tau)$ the set of vectors
$\{\gamma + \sum_{l \in \tau} n_l e_l : n_l \in \mathbb{N}\}$. We call
$\gamma$ the root of the pair and $\tau$ the free directions of the pair. We say that the pair is associated
for the family of integer programs $\mathrm{IP}_{A,c} := \{\mathrm{IP}_{A,c}(b) : b \in \mathbb{N}A\}$ if both (i) $\mathrm{supp}(\gamma) \cap \tau = \emptyset$
and (ii) every vector $p \in (\gamma, \tau)$ is an optimal solution for $\mathrm{IP}_{A,c}(Ap)$. Furthermore, if there
does not exist another associated pair $(\gamma', \tau')$ with $(\gamma, \tau) \subset (\gamma', \tau')$ then we say that $(\gamma, \tau)$ is a
standard pair for the family of integer programs.

We will now prove Theorem 1.1, relying on the verification of propositions that will follow.
As in the previous section, $\hat{\sigma}$ denotes the support of $\hat{f}_n$, the Gröbner basis element of $A^n$ which
we showed was also a circuit of $A^n$. As before, let the indices of the columns of $A^n$ be denoted
by $(i|l|l')$ where $i \in \{0,1\}^{n-2}$ and $l, l' \in \{0,1\}$. Finally, let $M(q)$ denote the set of margins
$\{b : A^n x = b \text{ and } \mathbf{1} \cdot x = q\}$. Note that $A^n$ is graded, i.e. the entries in each column of $A^n$ sum
to $n$, and so every margin $b$ belongs to a unique $M(q)$.

Proof of Theorem 1.1. From Proposition 2.1, Sullivant's $(2^{n-3} - 1)$-gap was created by the
margin $b = A^n(u_n - e_{(0|0|0)})$. In the remainder of this section we will show the following:

Claim: The margin $b = A^n(u_n - e_{(0|0|0)})$ belongs to the image (under $A^n$) of the standard
pair $((2^{n-3} - 1) \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$ and to no other standard pair.

Note too that most elements from the standard pair $((2^{n-3} - 1) \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$ are
(coordinate-wise) greater than $u_n - e_{(0|0|0)}$ and it follows easily that these elements also create
gaps of size $2^{n-3} - 1$. We will call the non-negative integer image under $A^n$ of this standard
pair the Sullivant $(2^{n-3} - 1)$-margins.

These margins are rare in the following sense. By the gradedness of $A^n$ each $M(q)$ is
contained in the lattice points of a slice of the cone $\mathrm{cone}(A^n)$. This cone has dimension
$2^{n-1} + 2^{n-2}$ [11, Theorem 2.6] and so each margin slice $M(q)$ of this cone is a collection of
lattice points in a polytope of dimension $2^{n-1} + 2^{n-2} - 1$. On the other hand, the Sullivant
$(2^{n-3} - 1)$-margins all live in the shifted cone
$A^n(u_n - e_{(0|0|0)}) + \mathrm{cone}(A^n_{\hat{\sigma} \setminus \{(0|0|0)\}})$ which has
dimension $|\hat{\sigma}| - 1 = 2(2^{n-2} + 1) - 1 = 2^{n-1} + 1$. Consequently for each $q$, the
$(2^{n-3} - 1)$-margins sit in a $2^{n-1}$-dimensional slice of the $(2^{n-1} + 2^{n-2} - 1)$-dimensional $M(q)$.
Hence, the $(2^{n-3} - 1)$-margins sit in a relatively very small slice of $M(q)$ for every $q$ and would
thus, in a random uniform choice of margin from $M(q)$, be rarely encountered. □
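
The dimension count in this proof is simple arithmetic, and tabulating it for a few values of $n$ shows how quickly the codimension grows. The sketch below only evaluates the formulas quoted in the proof; the function name `slice_dimensions` is hypothetical.

```python
def slice_dimensions(n):
    """Dimensions from the proof of Theorem 1.1 for the model on n binary variables.

    Returns (dimension of the margin slice M(q),
             dimension of the slice containing the Sullivant margins).
    """
    dim_margin_slice = 2 ** (n - 1) + 2 ** (n - 2) - 1
    support_size = 2 * (2 ** (n - 2) + 1)        # size of the support of the lifted circuit
    dim_shifted_cone = support_size - 1          # the index (0|0|0) is not a free direction
    dim_sullivant_slice = dim_shifted_cone - 1   # fixing q removes one more dimension
    return dim_margin_slice, dim_sullivant_slice


for n in range(4, 9):
    total, rare = slice_dimensions(n)
    print(n, total, rare, total - rare)
```

For $n = 4$ this gives an 8-dimensional slice inside an 11-dimensional one, and in general the codimension is $2^{n-2} - 1$, so it doubles (roughly) with every additional variable.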

Before proving the central claim of the above proof there are a number of things to note from
the above analysis of the $(2^{n-3} - 1)$-margins. A reasonable alternative approach to measuring
the frequency of these $(2^{n-3} - 1)$-margins would be to ignore the standard pair analysis and
instead ask how frequently these margins occur asymptotically, i.e. among all the margins
with large 1-norms. In this case, regardless of the model $\mathcal{S}$, most gaps are 0. This makes
reasonable sense when phrased as follows: the linear relaxation of an integer program has an
integer solution for most right hand sides $b$ when $|b| \gg 0$. A clean algebraic statement in terms
of the Hilbert function of toric ideals can be seen in [15, Prop. 12.16]. However, given that released
margins with large norms will come from tables with large cell counts (which are harder to
bound tightly and are consequently more secure), the asymptotic results are not relevant, thus
justifying the need for the analysis above.

Also note that since the cone is shifted significantly from the origin, the $(2^{n-3} - 1)$-margins
only start to appear in $M(q)$ slices for $q \ge 2^{n-3} - 1$ and so on contingency tables with
mostly small counts, instances of these large gaps will never be encountered. Contingency
tables with small cell counts are common in practice (see, for example, [3], [7]) which could give
a further explanation as to why the large gaps are not encountered in practice.

Finally, we noted that a random uniform choice of margin in $M(q)$ is highly unlikely to
pick out a $(2^{n-3} - 1)$-margin but there may be some prior distribution on the margins that
is not uniform. A very reasonable assumption, based on sums being distributed normally, is
that the margins that are most frequently encountered are those in the centre of $\mathrm{cone}(A^n)$ but
in our instance the $(2^{n-3} - 1)$-margins appear in a shifted cone, shifted in a highly skewed
fashion away from the centre of $\mathrm{cone}(A^n)$ along one of the extreme rays of that cone. So in
fact, the uniform assumption in the proof of Theorem 1.1 may even be overly generous to the
occurrence of the $(2^{n-3} - 1)$-margins.

We now turn our attention to proving the main claim in the proof of Theorem 1.1. We first
need the following remark:

Remark 3.1. Given any matrix $A$ and a circuit $w$ of $A$, every integer vector in the kernel of
$A$ with support contained in the support of $w$ must be an integer multiple of $w$. In particular,
if $w_1 > 1$ then $w - e_1$ cannot be in the kernel of the matrix $A$.

For our interests, where $c := e_{\mathbf{0}}$ and $A = A^n$, we can rename the family of integer programs
as $\mathrm{IP}_{A^n,-}$ and the optimality condition (ii) in the associated pairs as follows:

(ii)$_-$: there do not exist both $\{n_l \in \mathbb{N} : l \in \tau\}$ and $t \in \mathbb{N}^{2^n}$ such that $t_{(0|0|0)} < p_{(0|0|0)}$ and
$A^n p := A^n(\gamma + \sum_{l \in \tau} n_l e_l) = A^n t$.

Proposition 3.2. The pair $(k \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$ is an associated pair for $\mathrm{IP}_{A^n,-}$ for all
$1 \le k \le 2^{n-3} - 1$.

Proof. Clearly each pair satisfies condition (i) above so all we need verify is the rephrased
condition (ii)$_-$. We will first show that condition (ii)$_-$ holds for $k = 2^{n-3} - 1$.

Let $p = (2^{n-3} - 1) \cdot e_{(0|0|0)} + \sum_{l \in \hat{\sigma} \setminus \{(0|0|0)\}} n_l e_l$. If there were to exist a $t$ such that $A^n p = A^n t$ with
$t_{(0|0|0)} < 2^{n-3} - 1$ then $p - t$ would be an integer vector in the kernel of $A^n$. Since both $p$ and
$t$ are non-negative, we can assume that their respective supports are disjoint. Since $p$ is the
positive part of the kernel element, an index element $(\eta|0)$ (or $(\eta|1)$) is in the support of $p$
if and only if $(\eta|1)$ (or $(\eta|0)$ respectively) is in the support of $t$. Therefore, by the construction
of $\hat{f}_n$ and since $\mathrm{supp}(p) \subseteq \hat{\sigma}$ we must have $\mathrm{supp}(t) \subseteq \hat{\sigma}$.

But this cannot occur: if the set $\mathrm{supp}(t)$ were contained in $\hat{\sigma}$ then $\mathrm{supp}(p - t)$ would also
be contained in $\hat{\sigma}$ with the $(0|0|0)$-entry of this integer vector being positive and less than
$2^{n-3}$, which using the fact that $\hat{f}_n$ forms a circuit, would contradict Remark 3.1. Hence
(ii)$_-$ is satisfied and so $((2^{n-3} - 1) \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$ is an associated pair.

For the other values of $k$, we know that for any family of integer programs, if we have a
vector $p$ that is an optimal solution and another non-negative integral vector $p'$ with $p' \le p$
then $p'$ is also an optimal solution for that family. This proves that we have an associated pair
for the other values of $k$ too. □

Note that nothing special was used here about the matrix $A^n$, only that it was the matrix
of margins for a logit model, so we have the following general result for quickly identifying gaps
for other logit models:

Corollary 3.3. Let $f$ be a circuit for any model $\mathrm{logit}(\mathcal{S})$ with $f_{\mathbf{0}} = \alpha$ and $\sigma = \mathrm{supp}(f)$. Then
$(k \cdot e_{\mathbf{0}},\ \sigma \setminus \{\mathbf{0}\})$ is an associated pair for $\mathrm{IP}_{\mathrm{logit}(\mathcal{S}),-}$ for all $1 \le k \le \alpha - 1$.

The next proposition claims that each of the associated pairs from the previous proposition
is in fact a standard pair.

Proposition 3.4. The pair $(k \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$ is a standard pair for $\mathrm{IP}_{A^n,-}$ for all
$1 \le k \le 2^{n-3} - 1$.

Proof. It will suffice to consider the case of $k = 1$. By the previous proposition, we know that
the pair is associated. Recall that we have a containment of associated pairs $(\gamma, \tau) \subseteq (\gamma', \tau')$ if
and only if $\gamma' \le \gamma$ and $\mathrm{supp}(\gamma - \gamma') \cup \tau \subseteq \tau'$. If $\gamma = e_{(0|0|0)}$ and $\tau = \hat{\sigma} \setminus \{(0|0|0)\}$ then such a $\gamma'$
would equal $\mathbf{0}$ or $e_{(0|0|0)}$. If $\gamma' = \mathbf{0}$ then $(0|0|0) \in \tau'$ and $(\mathbf{0}, \hat{\sigma})$ would have to be an associated
pair, which cannot be the case since the non-optimal solution $u_n$ is in this pair.

The other alternative is that $\gamma' = e_{(0|0|0)}$ and in this case we need to show there does not exist
an $l \notin \hat{\sigma}$ for which the pair $(e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\} \cup \{l\})$ is an associated pair. Such $l$'s are of one
of the following forms: (a) $(i|1|0)$, (b) $(i|0|1)$, (c) $(i|0|0)$ or (d) $(i|1|1)$ where $\mathbf{0} \ne i \in \{0,1\}^{n-2}$ as
before. Note that regardless of the value of $i \ne \mathbf{0}$, we always have the relation $A^n w^+ = A^n w^-$
where $w^+ := e_{(0|0|0)} + e_{(i|1|0)} + e_{(0|1|1)} + e_{(i|0|1)}$ and $w^- := e_{(0|1|0)} + e_{(i|0|0)} + e_{(0|0|1)} + e_{(i|1|1)}$.

(a) By construction, $l = (i|1|0) \notin \hat{\sigma}$ is equivalent to $(i|0|0)$ and $(i|0|1)$ both being elements of
$\hat{\sigma}$. We claim that the pair $(e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\} \cup \{(i|1|0)\})$ violates condition (ii)$_-$. To
see this notice that the choices of $p = w^+$ and $t = w^-$ satisfy the following: $p$ has
support in $\hat{\sigma} \cup \{(i|1|0)\}$, $A^n p = A^n t$ and $0 = t_{(0|0|0)} < p_{(0|0|0)} = 1$. Hence, this is
such a choice for $p$ and $t$ violating (ii)$_-$.

(b) The exact same choice of $p$ and $t$ can be made in this case as for part (a).

(c) Consider the integral vector $\hat{f}_n - (2^{n-3} - 1)(w^+ - w^-)$. This vector is in the kernel of
$A^n$ with $(0|0|0)$-entry equal to 1 and with positive support wholly contained in $\hat{\sigma} \cup \{l\}$.
Letting $p$ and $t$ be the positive and negative parts respectively of
$\hat{f}_n - (2^{n-3} - 1)(w^+ - w^-)$ we have a violation of condition (ii)$_-$.

(d) The exact same choice of $p$ and $t$ can be made in this case as for part (c).

Thus we have shown the $k = 1$ case. For $2 \le k \le 2^{n-3} - 1$ simply replace $p$ and $t$ by $k \cdot p$ and
$k \cdot t$ respectively. □

In the last proposition we created a set of standard pairs that contained the optimal solutions
$u_n - (2^{n-3} - k)e_{(0|0|0)}$ for every $1 \le k \le 2^{n-3} - 1$. We can now complete the proof of the
central claim in Theorem 1.1.

Proposition 3.5. The optimal solution $u_n - e_{(0|0|0)}$ is contained in precisely one standard
pair, namely, $((2^{n-3} - 1) \cdot e_{(0|0|0)},\ \hat{\sigma} \setminus \{(0|0|0)\})$.

Proof. The analysis is very similar to that which was carried out in Proposition 3.4. We need
to show that for any $l \notin \hat{\sigma}$, there exists $n_l \in \mathbb{N}$ such that $u_n - e_{(0|0|0)} + n_l e_l$ is not optimal.
Such $l$'s are of one of the following forms: (a) $(i|1|0)$, (b) $(i|0|1)$, (c) $(i|0|0)$ or (d) $(i|1|1)$ where
$\mathbf{0} \ne i \in \{0,1\}^{n-2}$ as before. The kernel element $w^+ - w^-$ of $A^n$ is as above.

(a) By construction, $l = (i|1|0) \notin \hat{\sigma}$ is equivalent to $\mathbf{1} \cdot i$ being odd. If $u_n - e_{(0|0|0)} + n_l e_l$
were optimal for every $n_l \in \mathbb{N}$ then every vector $\mathbf{0} \le z \le u_n - e_{(0|0|0)} + n_l e_l$ would also
be optimal. But, since $\mathbf{1} \cdot i$ is odd, $z = w^+$ is such a vector and we already know
that this vector is not optimal.

(b) The case of $l = (i|0|1) \notin \hat{\sigma}$ (with $\mathbf{1} \cdot i$ even) can be argued as in part (a).

(c) The next case is $l = (i|0|0) \notin \hat{\sigma}$ with $\mathbf{1} \cdot i$ being even. Here the index vector $(i|0|0)$ is in
$\mathrm{supp}(w^-)$ and $(i|0|1)$ is in $\mathrm{supp}(w^+)$. Similar to the proof of Proposition 3.4 part (c),
let $p$ and $t$ be the positive and negative parts respectively of $\hat{f}_n - (w^+ - w^-)$. In
this case, $p \le u_n - e_{(0|0|0)} + e_l$ and so it would need to be optimal if the free direction
$l$ were to be allowed. However, $0 = t_{(0|0|0)} < p_{(0|0|0)} = 2^{n-3} - 1$ and so we cannot have
$l = (i|0|0)$ with $\mathbf{1} \cdot i$ being even as a free direction in a standard pair that contains
$u_n - e_{(0|0|0)}$.

(d) The case of $l = (i|1|1) \notin \hat{\sigma}$ with $\mathbf{1} \cdot i$ being odd is the same as that made in part (c). □



4. Closing Remarks 

Rather than asserting that linear programming is an effective heuristic for detecting
disclosures when releasing margins of multi-way tables, our result reopens this possibility, proposing
that large gaps in small hierarchical models do indeed exist but may only rarely be encountered
in practice. We have not addressed what happens for the other large gaps from Corollary 2.2
that occur in the model $\Delta_n$. Nor have we addressed the extent to which the rarity encountered
here happens for other hierarchical models. We attempt to address this by briefly reporting
on some computational results.

From Corollary 2.2 and the discussion preceding it there are standard pairs like those from
Proposition 3.4 whose respective images contain the $k$-margins for $1 \le k \le 2^{n-3} - 2$ respectively.
Using Macaulay 2 [8] we were able to confirm for $n = 4$ and $n = 5$ that while we did not have
the uniqueness property of Proposition 3.5 for these $k$-gaps it was the case that the standard
pairs $(\gamma, \tau)$ for all of these margins had $|\tau| = |\hat{\sigma}| - 1$. Hence, for $n = 4, 5$ the computational
evidence suggests that each of these $k$-gaps is contained in a $2^{n-1}$-dimensional slice of the
$(2^{n-1} + 2^{n-2} - 1)$-dimensional $M(q)$'s. Furthermore, they were all highly skewed along the
$(0|0|0)$ ray in the same manner as discussed after the proof of Theorem 1.1 for the
$(2^{n-3} - 1)$-margins, again making these $k$-margins unlikely to be encountered.

Other instances of models with large gaps constructed from Proposition 2.1 can be found in
[2, Prop. 2.7]. The binary model there is the collection of all edges of the complete graph with
$n \ge 4$ vertices and a Gröbner basis element (with respect to $(1, \mathbf{0})$) is found there that provides
a lower bound on the gap that grows linearly in $n$. In this case too the computational evidence
using Macaulay 2 indicates that all margins attained from Proposition 2.1 occur rarely. In the
course of this work the answer to the following question seemed to be "yes" and may be of
independent interest for those interested in Markov moves (see for example [6, Ch. 1]):

Question 4.1. If $g := u - v$ is a Gröbner basis element for the matrix of margins for the
model $\mathcal{S}$ then is it true that (1) $|\mathrm{supp}(g)| > \mathrm{rank}(A(\mathcal{S}))$ implies that all entries of $g$ belong to
$\{-1, 0, +1\}$? (2) Given any gap arising from Proposition 2.1, is it true that its standard pair
is always of the form $(\gamma, \tau)$ where $\tau \subset \mathrm{supp}(g)$?

Thomas Kahle computationally verified that (1) is true for all models recorded at [12]. Note
that if both (1) and (2) are true then every gap attained from Proposition 2.1 would be rare
in the sense of Theorem 1.1.

Finally, the gaps coming from Proposition 2.1 are not the only way that gaps can arise.
The gap can be computed precisely [10] by solving a collection of group relaxations [14, Ch.
24] coming from the collection of standard pairs for $\mathrm{IP}_{A^n,-}$. Using Macaulay 2, in the case of
$n = 5$ there are 1280 such standard pairs $(\gamma, \tau)$ that need to be considered and 1013 produce a
gap greater than 0. But when checked computationally for the $n = 5$ case, the standard
pairs that had gap greater than or equal to 1 were exactly those whose number of free
directions was strictly less (and considerably less) than the rank of $A^5$. The same holds for
$n = 4$ and $n = 5$ in the case of the model studied in [2, Prop. 2.7].

In conclusion, the computations using Macaulay 2 suggest that the results of Section 3
may be the typical scenario, that the gaps provided by Proposition 2.1 may always be
rare and furthermore that other gaps greater than or equal to 1 may be equally rare. Thus
the computations lend further support to linear programming being an effective heuristic for
detecting disclosures when releasing margins of multi-way tables.